Abstract: We present a label-free, chemically-selective, quantitative 
imaging strategy to identify breast cancer and differentiate its subtypes 
using coherent anti-Stokes Raman scattering (CARS) microscopy. Human 
normal breast tissue, benign proliferative, as well as in situ and invasive 
carcinomas, were imaged ex vivo. Simply by visualizing cellular and tissue 
features appearing on CARS images, cancerous lesions can be readily 
separated from normal tissue and benign proliferative lesions. To further 
distinguish cancer subtypes, quantitative disease-related features, describing 
the geometry and distribution of cancer cell nuclei, were extracted and 
applied to a computerized classification system. The results show that in situ 
carcinoma was successfully distinguished from invasive carcinoma, while 
invasive ductal carcinoma (IDC) and invasive lobular carcinoma were also 
distinguished from each other. Furthermore, 80% of intermediate-grade IDC 
and 85% of high-grade IDC were correctly distinguished from each other. 
The proposed quantitative CARS imaging method has the potential to 
enable rapid diagnosis of breast cancer. 

References and links 

1. M. Heron, "Deaths: leading causes for 2004," Natl. Vital Stat. Rep. 56(5), 1-95 (2007). 

2. American Cancer Society, Cancer Facts & Figures 2011 (American Cancer Society, Atlanta, GA, 2011). 

3. M. J. Beresford, A. R. Padhani, N. J. Taylor, M. L. Ah-See, J. J. Stirling, A. Makris, J. A. d'Arcy, and D. J. 
Collins, "Inter- and intraobserver variability in the evaluation of dynamic breast cancer MRI," J. Magn. Reson. 
Imaging 24(6), 1316-1325 (2006). 

4. P. Robbins, S. Pinder, N. de Klerk, H. Dawkins, J. Harvey, G. Sterrett, I. Ellis, and C. Elston, "Histological 
grading of breast carcinomas: a study of interobserver agreement," Hum. Pathol. 26(8), 873-879 (1995). 

5. P. K. Gupta, S. K. Majumder, and A. Uppal, "Breast cancer diagnosis using N2 laser excited autofluorescence 
spectroscopy," Lasers Surg. Med. 21(5), 417-422 (1997). 

6. S. K. Majumder, N. Ghosh, and P. K. Gupta, "N2 laser excited autofluorescence spectroscopy of formalin-fixed 
human breast tissue," J. Photochem. Photobiol. B 81(1), 33-42 (2005). 

7. Y. Yang, A. Katz, E. J. Celmer, M. Zurawska-Szczepaniak, and R. R. Alfano, "Fundamental differences of 
excitation spectrum between malignant and benign breast tissues," Photochem. Photobiol. 66(4), 518-522 
(1997). 

8. A. S. Haka, K. E. Shafer-Peltier, M. Fitzmaurice, J. Crowe, R. R. Dasari, and M. S. Feld, "Diagnosing breast 
cancer by using Raman spectroscopy," Proc. Natl. Acad. Sci. U.S.A. 102(35), 12371-12376 (2005). 



#148131 - $15.00 USD Received 24 May 2011; revised 24 Jun 2011; accepted 29 Jun 2011; published 5 Jul 2011 
(C) 2011 OSA 1 August 2011 / Vol. 2, No. 8 / BIOMEDICAL OPTICS EXPRESS 2160 



9. A. S. Haka, Z. Volynskaya, J. A. Gardecki, J. Nazemi, J. Lyons, D. Hicks, M. Fitzmaurice, R. R. Dasari, J. P. 
Crowe, and M. S. Feld, "In vivo margin assessment during partial mastectomy breast surgery using raman 
spectroscopy," Cancer Res. 66(6), 3317-3322 (2006). 

10. F. T. Nguyen, A. M. Zysk, E. J. Chaney, J. G. Kotynek, U. J. Oliphant, F. J. Bellafiore, K. M. Rowland, P. A. 
Johnson, and S. A. Boppart, "Intraoperative evaluation of breast tumor margins with optical coherence 
tomography," Cancer Res. 69(22), 8790-8796 (2009). 

11. M. V. Chowdary, K. K. Kumar, J. Kurien, S. Mathew, and C. M. Krishna, "Discrimination of normal, benign, 
and malignant breast tissues by Raman spectroscopy," Biopolymers 83(5), 556-569 (2006). 

12. J. X. Cheng and X. S. Xie, "Coherent anti-Stokes Raman scattering microscopy: instrumentation, theory, and 
applications," J. Phys. Chem. B 108(3), 827-840 (2004). 

13. F. Ganikhanov, C. L. Evans, B. G. Saar, and X. S. Xie, "High-sensitivity vibrational imaging with frequency 
modulation coherent anti-Stokes Raman scattering (FM CARS) microscopy," Opt. Lett. 31(12), 1872-1874 
(2006). 

14. Z. Wang, Y. Yang, P. Luo, L. Gao, K. K. Wong, and S. T. C. Wong, "Delivery of picosecond lasers in 
multimode fibers for coherent anti-Stokes Raman scattering imaging," Opt. Express 18(12), 13017-13028 (2010). 

15. C. L. Evans and X. S. Xie, "Coherent anti-stokes Raman scattering microscopy: chemical imaging for biology 
and medicine," Annu Rev Anal Chem (Palo Alto Calif) 1(1), 883-909 (2008). 

16. C. L. Evans, E. O. Potma, and X. S. Xie, "Coherent anti-stokes raman scattering spectral interferometry: 
determination of the real and imaginary components of nonlinear susceptibility χ(3) for vibrational microscopy," 
Opt. Lett. 29(24), 2923-2925 (2004). 

17. C. L. Evans, E. O. Potma, M. Puoris'haag, D. Cote, C. P. Lin, and X. S. Xie, "Chemical imaging of tissue in vivo 
with video-rate coherent anti-Stokes Raman scattering microscopy," Proc. Natl. Acad. Sci. U.S.A. 102(46), 
16807-16812 (2005). 

18. R. Mouras, G. Rischitor, A. Downes, D. Salter, and A. Elfick, "Nonlinear optical microscopy for drug delivery 
monitoring and cancer tissue imaging," J. Raman Spectrosc. 41(8), 848-852 (2010). 

19. T. T. Le, T. B. Huff, and J. X. Cheng, "Coherent anti-Stokes Raman scattering imaging of lipids in cancer 
metastasis," BMC Cancer 9(1), 42 (2009). 

20. C. Krafft, A. A. Ramoji, C. Bielecki, N. Vogler, T. Meyer, D. Akimov, P. Rosch, M. Schmitt, B. Dietzek, I. 
Petersen, A. Stallmach, and J. Popp, "A comparative Raman and CARS imaging study of colon tissue," J 
Biophotonics 2(5), 303-312 (2009). 

21. X. Nan, J. X. Cheng, and X. S. Xie, "Vibrational imaging of lipid droplets in live fibroblast cells with coherent 
anti-Stokes Raman scattering microscopy," J. Lipid Res. 44(11), 2202-2208 (2003). 

22. M. Müller and A. Zumbusch, "Coherent anti-Stokes Raman scattering microscopy," ChemPhysChem 8(15), 
2156-2170 (2007). 

23. T. B. Huff and J. X. Cheng, "In vivo coherent anti-Stokes Raman scattering imaging of sciatic nerve tissue," J. 
Microsc. 225(2), 175-182 (2007). 

24. C. L. Evans, X. Xu, S. Kesari, X. S. Xie, S. T. C. Wong, and G. S. Young, "Chemically-selective imaging of 
brain structures with CARS microscopy," Opt. Express 15(19), 12076-12087 (2007). 

25. J. X. Cheng, "Coherent anti-Stokes Raman scattering microscopy," Appl. Spectrosc. 61(9), 197-208 (2007). 

26. P. D. Chowdary, Z. Jiang, E. J. Chaney, W. A. Benalcazar, D. L. Marks, M. Gruebele, and S. A. Boppart, 
"Molecular histopathology by spectrally reconstructed nonlinear interferometric vibrational imaging," Cancer 
Res. 70(23), 9562-9569 (2010). 

27. V. Kumar, A. K. Abbas, N. Fausto, and J. C. Aster, Robbins and Cotran Pathologic Basis of Disease, 8th ed, 
(Saunders Elsevier, Philadelphia, PA, 2009). 

28. L. Gao, Y. Yang, J. Xing, M. J. Thrall, Z. Wang, F. Li, P. Luo, K. K. Wong, H. Zhao, and S. T. C. Wong, 
"Diagnosing lung cancer using coherent anti-Stokes Raman scattering microscopy," Proc. SPIE 7890, 789015 (2011). 

29. Y. Yang, L. Gao, Z. Wang, M. J. Thrall, P. Luo, K. K. Wong, and S. T. C. Wong, "Label-free imaging of human 
breast tissues using coherent anti-Stokes Raman scattering microscopy," Proc. SPIE 7903, 79032G, 79032G-6 
(2011). 

30. L. Gao, H. Zhou, M. J. Thrall, F. Li, Y. Yang, Z. Wang, P. Luo, K. K. Wong, G. S. Palapattu, and S. T. C. Wong, 
"Label-free high-resolution imaging of prostate glands and cavernous nerves using coherent anti-Stokes Raman 
scattering microscopy," Biomed. Opt. Express 2(4), 915-926 (2011). 

31. S. Beucher, "The watershed transformation applied to image segmentation," Scanning Microsc. Int. 6, 299-314 
(1992). 

32. L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulations," 
IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 583-598 (1991). 

33. M. Wang, X. Zhou, F. Li, J. Huckins, R. W. King, and S. T. C. Wong, "Novel cell segmentation and online SVM 
for cell cycle phase identification in automated microscopy," Bioinformatics 24(1), 94-101 (2008). 

34. W. Gander, G. H. Golub, and R. Strebel, "Least-squares fitting of circles and ellipses," BIT Numer. Math. 34(4), 
558-578 (1994). 

35. T. Jones, A. Carpenter, and P. Golland, "Voronoi-based segmentation of cells on image manifolds," Lect. Notes 
Comput. Sci. 3765, 535-543 (2005). 






36. G. Voronoi, "Nouvelles applications des paramètres continus à la théorie des formes quadratiques," J. Reine 
Angew. Math. 133, 97-178 (1907). 

37. F. Li, X. Zhou, J. Ma, and S. T. C. Wong, "Multiple nuclei tracking using integer programming for quantitative 
cancer cell cycle analysis," IEEE Trans. Med. Imaging 29(1), 96-105 (2010). 

38. J. O'Rourke, Computational Geometry in C, 2nd ed. (Cambridge University Press, New York, 1998). 

39. P. Geladi and B. R. Kowalski, "Partial least-squares regression: a tutorial," Anal. Chim. Acta 185(1), 1-17 
(1986). 

40. H. Abdi, "Partial least squares (PLS) regression," in Encyclopedia for Research Methods for the Social Sciences, 
M. Lewis-Beck, A. Bryman, and T. Futing, eds. (Sage, Thousand Oaks, CA, 2003). 

41. I. S. Helland, "Partial least squares regression and statistical models," Scand. J. Stat. 17, 97-114 (1990). 

42. A. Höskuldsson, "PLS regression methods," J. Chemometr. 2(3), 211-228 (1988). 

43. D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," 
Adv. Neural Inform. Proc. Syst. 16, 321-328 (2004). 

44. J. Wang, S. F. Chang, X. Zhou, and S. T. C. Wong, "Active microscopic cellular image annotation by 
superposable graph transduction with imbalanced labels," in IEEE Conference on Computer Vision and Pattern 
Recognition, 2008. CVPR 2008 (IEEE, 2008), pp. 1-8. 

45. B. C. Pestalozzi, "Portrait of invasive lobular carcinoma of the breast," Eur. J. Cancer 45(Suppl 1), 450-451 
(2009). 

46. D. Dian, H. Herold, I. Mylonas, C. Scholz, W. Janni, H. Sommer, and K. Friese, "Survival analysis between 
patients with invasive ductal and invasive lobular breast cancer," Arch. Gynecol. Obstet. 279(1), 23-28 (2009). 

47. B. C. Pestalozzi, D. Zahrieh, E. Mallon, B. A. Gusterson, K. N. Price, R. D. Gelber, S. B. Holmberg, I. Lindtner, 
R. Snyder, B. Thurlimann, E. Murray, G. Viale, M. Castiglione-Gertsch, A. S. Coates, and A. Goldhirsch; 
International Breast Cancer Study Group, "Distinct clinical and prognostic features of infiltrating lobular 
carcinoma of the breast: combined results of 15 International Breast Cancer Study Group clinical trials," J. Clin. 
Oncol. 26(18), 3006-3014 (2008). 

48. E. A. Rakha, M. E. El-Sayed, D. G. Powe, A. R. Green, H. Habashy, M. J. Grainge, J. F. Robertson, R. Blamey, 
J. Gee, R. I. Nicholson, A. H. Lee, and I. O. Ellis, "Invasive lobular carcinoma of the breast: response to 
hormonal therapy and outcomes," Eur. J. Cancer 44(1), 73-83 (2008). 

49. C. Zhou, D. W. Cohen, Y. Wang, H. C. Lee, A. E. Mondelblatt, T. H. Tsai, A. D. Aguirre, J. G. Fujimoto, and J. 
L. Connolly, "Integrated optical coherence tomography and microscopy for ex vivo multiscale evaluation of 
human breast tissues," Cancer Res. 70(24), 10071-10079 (2010). 



1. Introduction 

Breast cancer is the second leading cause of cancer-related deaths in women (1 in 8 women; 
about 13%) and accounts for approximately one-third of all cancers diagnosed among women 
in the United States [1]. The American Cancer Society estimated 230,480 new cases of 
invasive breast cancer and 57,650 new cases of in situ breast cancer, as well as approximately 
39,520 breast cancer-related deaths, in women in 2011 [2]. Therapeutic decisions are based on 
imaging studies and pathologic diagnosis, neither of which has perfect sensitivity or 
specificity [3,4]. As the gold standard for clinical diagnosis, surgical pathology examines 
multiple histological features of tissues or cells (cell size, shape, and density or the formation 
of specific patterns) removed by surgeons or radiologists to characterize cancer lesions and 
their subtypes. The diagnostic process usually begins with a breast biopsy of either abnormal 
calcification or mass lesion, which is often performed by open surgery that removes the entire 
lesion, or by minimally-invasive core-needle biopsy that removes 5-12 cores of tissues to 
ensure adequate sampling. The excised tissues are then fixed, sliced, stained, and finally 
examined under a microscope by pathologists to make a diagnosis, resulting in a turnaround 
time ranging from hours to days. Frozen sections are more rapid, but are usually not 
performed on breast specimens because fatty tissue does not perform well in this technique. 
As a result of the long turnaround time for conventional histology, another procedure is often 
necessary because biopsies need to be repeated or margins need to be re-excised. Resulting 
delays or misdiagnosis in this process could directly lead to a missed opportunity to treat 
lesions early or unnecessarily aggressive therapies with harmful side-effects. Since diagnosis 
of cancer lesions plays a critical role in breast cancer prevention and treatments, a more rapid 
diagnostic technique could potentially reduce the number of repeated procedures while 
facilitating the whole process by allowing on-the-spot recognition of inadequate biopsies or 
positive margins. 

In light of this, a variety of optical imaging techniques, such as fluorescence and Raman 
spectroscopies, have been explored to improve breast cancer diagnosis. Fluorescence 
spectroscopy has been demonstrated as a useful tool in breast disease correlations through ex 
vivo imaging experiments [5-7]. Although fluorescence imaging provides relatively high 
signal-to-background ratio, the small number of endogenous fluorophores in breast tissue and 
their overlapping spectra limit its applications [8]. Raman spectroscopy is another modality 
that has been investigated for disease diagnosis. It functions to identify disease lesions by 
capturing intrinsic chemical changes within tissues [8]. A previous study demonstrated its 
usefulness in identifying carcinomas, reporting a sensitivity of 94%, a 
specificity of 96% and an overall accuracy of 86% [9]. However, this technique is limited by 
its long acquisition time (> 1 s/pixel) at high excitation power, which prevents 
fast scanning of large surface areas with high spatial resolution [10]. Collectively, then, 
there is considerable interest in developing a fast, less invasive, and more objective method 
for the screening and diagnosis of breast cancer [11]. 

As a molecular imaging technique, coherent anti-Stokes Raman scattering (CARS) 
microscopy has been demonstrated as a powerful tool for label-free imaging with 
subwavelength spatial resolution [12-15]. CARS imaging generates contrast by probing 
resonances from specific chemical bonds in unstained samples, which gives it chemical 
selectivity. Its coherent nature further renders the CARS signal several orders of magnitude 
stronger than the conventional Raman signal, thus enabling video-rate imaging speed [16,17]. 
Therefore, this imaging modality has been successfully applied to a variety of biomedical 
applications, including the imaging of viruses, cells, tissues and live animals, as well as drug 
delivery [12,18-25]. In the field of cancer detection, a recent study showed the use of 
multiplex CARS for interferometric imaging of breast cancer for identification of cancer 
margins [26]. In this study, breast tissues were evaluated using their spectrum profile for 
construction of a digitized image for identification of tumor boundaries. The strategy was 
based on the chemically-selective modality of the CARS technique, but did not use its high 
spatial resolution in capturing cellular structures. 

Current pathology examination of stained breast biopsy samples focuses on changes in 
such cellular and histological features as cell size, cell-cell distance, and formation of fibrous 
structures [27]. Accurate identification of these features will lead to delineating the type of 
lesions for definitive treatment. However, conventional pathology examination is still subject 
to interobserver variations [4]. The CARS technique provides high-resolution images which 
can clearly detect individual cells without using any exogenous agent to stain tissue. 
Therefore, we hypothesized that a cell/tissue pattern recognition method could be developed 
using established pathological workup and diagnostic features as a basis for the quantitative 
classification of different types of breast lesions, leading, in turn, to a fast examination 
strategy for the analysis of breast cancer samples. Accordingly, in this study, such disease- 
related features as cell size, cell-cell distance and presence of fatty and fibrous structures were 
used for classification analysis. Cancerous lesions were initially separated from normal tissue 
and benign proliferative lesions using visual features such as the presence of fatty and fibrous 
structures. To further separate cancer subtypes, cellular features related to the morphology and 
distribution of cancer cell nuclei were extracted from ex vivo CARS images of human breast 
tissues, including ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), and 
invasive lobular carcinoma (ILC), and used to quantitatively characterize different cancer 
subtypes through a classification strategy based on machine learning techniques. To the best 
of our knowledge, this pilot study demonstrates the first diagnostic platform with label-free 
and fast imaging properties with the potential to distinguish breast cancer from normal and 
benign tissues on the basis of quantitative cellular and tissue features applied to a 
computerized classification system. 

2. Materials and methods 



2.1. CARS microscopy 

As shown in Fig. 1, our CARS microscope consists of a laser source and a 
beam-scanning microscope. A mode-locked Nd:YVO4 laser (High-Q Laser, Hohenems, 
Austria) provides a 7-ps pulse train at 1064 nm and a frequency-doubled, 5-ps pulse train at 
532 nm with a repetition rate of 76 MHz. The 1064 nm pulse train is used as the Stokes wave 
for the CARS process, while the 532 nm pulse train is used to pump an optical parametric 
oscillator (OPO) (Levante Emerald, APE-Berlin, Germany). The OPO generates a 5-ps pulse 
train output, which has a tunable wavelength range of 680-980 nm and is used as the pump 
wave for the CARS process. The pump and Stokes beams are overlapped in the temporal domain 
by adjusting a time-delay line and in the spatial domain by the long-pass dichroic mirror DM1 
(q1020lpxr, Chroma, VT) to satisfy the precondition for producing a CARS signal. The scanning 
microscope is modified from an Olympus FV300 confocal microscope adopting a 2D 
galvanometer. A red-light-sensitive photomultiplier tube (R3896, Hamamatsu, Japan) is used 
as the detector, which can sensitively detect the major spectral range of our CARS emission. 
A 60X, 1.2-NA water-immersion microscope objective (IR UPlanApo, Olympus) was used 
for this study. The lateral and axial resolutions are estimated to be approximately 0.4 μm and 
0.9 μm, respectively [28]. 



[Figure 1 panel labels: Laser source; Scanning microscope] 




Fig. 1. Schematic of CARS microscopy. M: mirror, OPO: optical parametric oscillator, DL: 
delay line, DM: dichroic mirror, L: lens, MO: microscope objective, BT: breast tissue, BPF: 
band-pass filter, PMT: photomultiplier tube. 

2.2. Sample preparation 

Breast tissues were obtained from female patients undergoing surgical biopsy and surgery at 
The Methodist Hospital (TMH), Houston, TX, following Office of Human Subjects Research 
approval from The Methodist Hospital Research Institute (TMHRI). The excised tissues were 
immediately snap-frozen in liquid nitrogen for storage. A total of nineteen patients were 
enrolled in this study, including 4 cases of fibroadenoma, 2 cases of DCIS, 8 cases of IDC (2 
cases of intermediate-grade (IG-) IDC and 6 cases of high-grade (HG-) IDC), and 5 cases of 
ILC. Twelve normal tissue samples were also collected from the same patients with IDC and 
ILC. Frozen tissue samples were thawed at room temperature and then imaged ex vivo using 
the CARS microscope. Two to five sampling points were imaged for each specimen and a 
total of 48 sampling points were examined (9 from DCIS, 11 from IG-IDC, 17 from HG-IDC 
and 11 from ILC patients). At each sampling point, three images were acquired at different 
imaging depths, giving a total of 144 images for this study. After CARS 
imaging, the imaged locations were marked with India blue, and samples were fixed in buffered 
formalin, sliced 5-μm thick, and finally stained with H&E, as a standard control of the disease 
type. 

2.3. Image acquisition 

Tissue samples were placed on a 170-μm cover slip and inverted on a rubber ring to form a 
sample chamber to avoid possible compression and associated morphologic changes of tissues 
during imaging [29]. The pump wavelength was tuned to 816.8 nm and the Stokes wavelength 
was fixed at 1,064 nm to reach a beating frequency of 2,845 cm⁻¹, probing the symmetric CH2 
stretching band. The CARS signal at 663 nm was collected by the same objective, i.e., using a 
backward (Epi-) detection scheme. It was then separated from the excitation waves by the 
long-pass dichroic mirror DM2 (770dcxxr, Chroma, VT). Unwanted residual signals were 
blocked using a band-pass filter (BPF) (hq660/40m-2p, Chroma, VT). Image processing was 
performed using the Olympus Fluoview V5.0 software. Z-stack images with a 1-μm step size 
were acquired at two digital zooms. Low-power views at zoom 1.5X (~0.30 μm/pixel) 
provided overall architectural information for observing morphological features, 
while high-power views at zoom 3.0X (~0.15 μm/pixel) provided detailed cellular information 
for precise segmentation of cells. Average power on the sample was ~70 mW and 
~35 mW for the pump and Stokes beams, respectively. This power combination is higher than 
that typically used for CARS imaging because solid tumor tissues normally 
possess a lower lipid level than normal tissues. As a result, a higher excitation power is 
required to provide enough image contrast for observation of cellular structures in tumor 
tissues. The acquisition time was about 4 seconds per frame with 512 x 512 pixels. 
Bright-field images of the corresponding H&E slides were captured using an Olympus BX51 
microscope and examined by a pathologist to determine the type of lesion as a standard 
control. 
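
As a quick consistency check on the wavelengths quoted above, the beating frequency and the anti-Stokes wavelength follow from energy conservation (anti-Stokes frequency = 2 x pump frequency - Stokes frequency). The sketch below is illustrative only: the function names are hypothetical, and the wavelengths are treated as vacuum values, which is an assumption.

```python
def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm


def raman_shift_cm1(pump_nm: float, stokes_nm: float) -> float:
    """Beating frequency (pump minus Stokes) in cm^-1."""
    return wavenumber_cm1(pump_nm) - wavenumber_cm1(stokes_nm)


def anti_stokes_nm(pump_nm: float, stokes_nm: float) -> float:
    """Anti-Stokes wavelength from 2 * pump - Stokes, computed in wavenumbers."""
    anti_stokes_cm1 = 2.0 * wavenumber_cm1(pump_nm) - wavenumber_cm1(stokes_nm)
    return 1.0e7 / anti_stokes_cm1


# Wavelengths stated in Section 2.3.
shift = raman_shift_cm1(816.8, 1064.0)
signal = anti_stokes_nm(816.8, 1064.0)
print(f"Raman shift: {shift:.1f} cm^-1, CARS signal: {signal:.1f} nm")
```

For the stated wavelengths this evaluates to a shift of roughly 2844 cm⁻¹ and a signal near 663 nm, consistent with the rounded values given in the text.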

2.4. Quantitative image analysis 

Cell nucleus segmentation: A semi-automated segmentation algorithm was developed to 
accurately delineate boundaries of cell nuclei [30]. The process consists of one manual step 
and four automated steps, which together take approximately 5 minutes for the images from each patient. 
1) Manually select a point within a cell nucleus. 2) Crop an image patch (a square window 
containing the object of interest) centered at the selected point. 3) 
Apply a seeded watershed algorithm [31-33] to segment the image patch into background and 
foreground, and thus obtain a rough region of the cell nucleus. 4) Use the intensity threshold 
[m − 1.75δ, m + 1.75δ] to identify another rough nucleus region, where m and δ are the 
average intensity and standard deviation in a neighborhood of the center point. 5) Delineate 
the nuclear region by overlapping the rough regions obtained from the watershed and threshold 
methods (steps 3 and 4), and fit the result with an ellipse using the least-squares criterion [34] 
for a final boundary. An average of 27 cells for DCIS, 18 cells for IG-IDC, 16 cells for 
HG-IDC, and 12 cells for ILC per image were used for parameter calculation. Figure 2(B) 
provides an illustration of cell nuclear segmentation of an IDC sample shown in Fig. 2(A). 

Fig. 2. A CARS image (A) from the Z-stacks of an IDC sample, showing cell nuclei 
segmentation (B), Voronoi tessellation (C), and Delaunay triangulation (D). Image size: 120 x 
120 μm². 
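
Steps 4 and 5 of the segmentation procedure can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the neighborhood radius, the use of the population standard deviation, and the reading of "overlapping" as a pixel-wise intersection are all assumptions, and the watershed mask is supplied by hand rather than computed. The ellipse fit of step 5 is omitted.

```python
from statistics import mean, pstdev


def threshold_mask(patch, center, radius=1, k=1.75):
    """Step 4 sketch: keep pixels whose intensity lies in [m - k*delta, m + k*delta].

    m and delta are the mean and standard deviation of a square neighborhood
    (hypothetical radius) around the manually selected center point.
    """
    cy, cx = center
    rows = range(max(0, cy - radius), min(len(patch), cy + radius + 1))
    cols = range(max(0, cx - radius), min(len(patch[0]), cx + radius + 1))
    neighborhood = [patch[y][x] for y in rows for x in cols]
    m, delta = mean(neighborhood), pstdev(neighborhood)
    low, high = m - k * delta, m + k * delta
    return [[low <= value <= high for value in row] for row in patch]


def overlap(mask_a, mask_b):
    """Step 5 sketch (first half): pixel-wise intersection of two rough masks."""
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# Toy 5 x 5 patch: a dark 3 x 3 "nucleus" on a bright background.
patch = [
    [50, 50, 50, 50, 50],
    [50, 10, 11, 10, 50],
    [50,  9, 10, 11, 50],
    [50, 10,  9, 10, 50],
    [50, 50, 50, 50, 50],
]
# Hypothetical watershed result that slightly over-segments to the right.
watershed_mask = [[1 <= y <= 3 and 1 <= x <= 4 for x in range(5)] for y in range(5)]

nucleus = overlap(watershed_mask, threshold_mask(patch, center=(2, 2)))
nucleus_area = sum(sum(row) for row in nucleus)
```

In this toy case the intersection removes the over-segmented column, leaving only the 3 x 3 dark region.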

Validation of cell nuclei segmentation: One hundred cells were randomly selected from 10 
CARS images for each of the four breast cancer subtypes (DCIS, IG-IDC, HG-IDC, and 
ILC) to validate the semi-automated segmentation algorithm. The semi-automated 
segmentation results were compared with the manual segmentation results by calculating three 
scores: precision, recall and F-score. They are given as follows: 

$$p = (S_t \cap S_a)/S_a, \qquad r = (S_t \cap S_a)/S_t, \qquad f = (2 \times p \times r)/(r + p),$$

where $S_t$ is the ground truth (manual segmentation result) of the $i$-th cell, and $S_a$ is the 
semi-automated segmentation result measured by the software used in this study. Figure 3 shows the validation 
results of cell segmentation in terms of precision, recall and F-score. It can be seen that all three 
indexes are close to 90% for the 400 individual nuclei from four cancer subtypes, indicating 
the high accuracy of our cell nuclear segmentation algorithm. 
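
The three validation scores can be computed per cell from the two segmented regions. The sketch below is an illustration under assumptions: each region is represented as a set of (row, column) pixel coordinates, region size is taken as the pixel count, and the function name is hypothetical.

```python
def segmentation_scores(truth, auto):
    """Return (precision, recall, F-score) for one cell.

    truth: set of pixels in the manual (ground-truth) region, S_t
    auto:  set of pixels in the semi-automated region, S_a
    """
    if not truth or not auto:
        raise ValueError("both regions must contain at least one pixel")
    shared = len(truth & auto)            # size of the intersection of S_t and S_a
    precision = shared / len(auto)        # fraction of S_a that is correct
    recall = shared / len(truth)          # fraction of S_t that was recovered
    if shared == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (recall + precision)
    return precision, recall, f_score


# Toy example: two 4 x 4 regions offset by one column (12 shared pixels).
truth = {(y, x) for y in range(4) for x in range(4)}
auto = {(y, x) for y in range(4) for x in range(1, 5)}
p, r, f = segmentation_scores(truth, auto)
```

With 12 of 16 pixels shared in each direction, all three scores equal 0.75 in this toy example.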

2.5. Extraction of disease-related features 

We designated seven pathological features to characterize the difference among breast cancer 
subtypes: nuclear size, lengths of major (long) and minor (short) axes of cell nucleus, Voronoi 
tessellation (Fig. 2(C)) size (approximation of cell size) [35,36], as well as average, major 
(maximum) and minor (minimum) neighbor distances of cells in the Delaunay triangulation 
graph (Fig. 2(D)) [37,38]. Moreover, five parameters, including mean value, standard 
deviation, skewness, kurtosis, and entropy, were employed to evaluate the distribution of each 
feature. The mathematical definitions of skewness $\gamma_1(x)$, kurtosis $\gamma_2(x)$, and entropy $H(x)$ are as 
follows: 

$$\gamma_1(x) = \int \left[\frac{x-\mu}{\sigma}\right]^3 p(x)\,dx, \qquad \gamma_2(x) = \frac{\int (x-\mu)^4 p(x)\,dx}{\left[\int (x-\mu)^2 p(x)\,dx\right]^2} - 3,$$

$$H(x) = -\int_a^b \log_2(p(x))\,p(x)\,dx,$$

where $x$ is a random variable whose observations are within $[a, b]$, $p(x)$ is the probability 
density function of $x$, and $\mu$ and $\sigma$ denote the mean and 
standard deviation of $x$, respectively. Consequently, a total of thirty-five features (seven 
features, each described by five distribution parameters) were extracted to describe each CARS image. 
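
For a finite sample of one feature (for example, the nuclear sizes measured in one image), the five distribution parameters can be estimated as sketched below. This is a hedged illustration: the function name and the number of histogram bins are hypothetical, the moments are plain sample moments, and the entropy is a discrete histogram estimate standing in for the integral definition above. Applying it to each of the seven features yields the 7 x 5 = 35 values per image.

```python
import math


def distribution_descriptors(values, bins=8):
    """Mean, standard deviation, skewness, excess kurtosis and histogram entropy."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n   # second central moment
    sigma = math.sqrt(m2)
    if sigma == 0.0:
        skewness = kurtosis = 0.0                 # degenerate sample (assumption)
    else:
        m3 = sum((v - mu) ** 3 for v in values) / n
        m4 = sum((v - mu) ** 4 for v in values) / n
        skewness = m3 / sigma ** 3
        kurtosis = m4 / m2 ** 2 - 3.0
    # Discrete entropy (in bits) of an equal-width histogram over [min, max].
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - low) / width), bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return {"mean": mu, "std": sigma, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


# Toy sample: five equally spaced measurements.
stats = distribution_descriptors([1.0, 2.0, 3.0, 4.0, 5.0])
```

The toy sample is symmetric, so its skewness is zero, and its excess kurtosis is negative, as expected for a flat distribution.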




[Figure 3 axis categories: Precision, Recall, F-score] 



Fig. 3. Validation results of cell nuclear segmentation. DCIS: ductal carcinoma in situ, IG-IDC: 
intermediate-grade invasive ductal carcinoma, HG-IDC: High-grade invasive ductal carcinoma, 
ILC: invasive lobular carcinoma. 

2.6. Differential diagnosis analysis 

We performed two analyses to investigate the separation of cancer subtypes: partial least 
squares regression (PLSR) [39-42] and semi-supervised learning (SSL) classification. PLSR is 
used for 3-D data visualization, and SSL is used to classify different subtypes of breast 
lesions. The basic idea of PLSR is to build a regression prediction model between the 
observation variables X (independent variables) and the dependent variables Y [39-42]. 
When there are many independent variables (i.e., when the dimension of X is high), the 
prediction model becomes complicated and sensitive to noise [39-42]. To overcome this problem, 
the PLSR approach reduces the high-dimensional independent variable space to a 
lower-dimensional space (dimension reduction), which can be represented by only a few 
coordinates (latent components). A linear regression prediction model is then built between 
the latent components (by projecting the data points from the high-dimensional space onto the 
latent components to obtain new coordinates of the data points in the new space) and the 
dependent variables. Technically, then, PLSR predicts the dependent variables Y 
from the independent variables X by extending the idea of principal component analysis (PCA) 
[39-42]. The detailed implementation of PLSR can be found in [40]. In brief, this algorithm 
employs weight vectors $c$ and $w$ to maximize the squared covariance $[\mathrm{cov}(u,t)]^2 = [\mathrm{cov}(Yc, Xw)]^2$, 
where $u = Yc$ and $t = Xw$ are called score vectors for Y and X, respectively. After obtaining the 
$i$-th score vectors $u_i$ and $t_i$ (the projection coordinates of Y and X on the $i$-th latent components, 
i.e. $c_i$ and $w_i$), the process is then applied to the residual matrices $Y_i$ and $X_i$ to get the next set of 
score vectors $u_{i+1}$ and $t_{i+1}$, where 

$$Y_i = Y_{i-1} - u_i q_i^T, \qquad X_i = X_{i-1} - t_i p_i^T, \qquad p_i = \frac{X_{i-1}^T t_i}{t_i^T t_i}, \qquad q_i = \frac{Y_{i-1}^T u_i}{u_i^T u_i}.$$
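
To make the X-side of this procedure concrete, the sketch below extracts one latent component and deflates X. It is a simplified illustration under stated assumptions, not the implementation of [40]: it handles only a single response column y (so c reduces to a scalar and u is proportional to y), it assumes X and y are already mean-centered, it omits the Y-side deflation, and the function name is hypothetical.

```python
import math


def pls_component(X, y):
    """Extract one PLS component for a single centered response y.

    Returns the weight vector w, the score vector t = Xw, the loading vector
    p = X^T t / (t^T t), and the residual matrix X - t p^T.
    """
    n, d = len(X), len(X[0])
    # With unit-norm w, cov(y, Xw)^2 is maximized by w proportional to X^T y.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X and y are uncorrelated; no component to extract")
    w = [v / norm for v in w]
    t = [sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
    X_res = [[X[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
    return w, t, p, X_res


# Toy centered data: four samples, two features.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.0, 0.0, 0.0, 2.0]
w, t, p, X_res = pls_component(X, y)
```

After deflation, every column of the residual matrix is orthogonal to the score vector t, which is why the next component captures new variation. Plotting the first three score vectors gives the kind of 3-D view used for visualization.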

After data visualization using PLSR, SSL classification analysis [43,44] was performed to 
separate cancer subtypes. The idea of SSL is to make use of both the training data and the data 
structure information embedded in the unlabeled data. SSL is straightforwardly used to 
smooth classification results. In other words, SSL prefers that the nearby samples should 
belong to the same class, and the labeled samples transfer their label information outward to 
their nearby unlabeled neighbors gradually layer by layer. An intuitive example is the two 
moons data set provided in [43,44]. Mathematically, the SSL procedure is as follows. 
Given $m$ data points $X = \{x_1, x_2, \ldots, x_m\}$, each belonging to one of $c$ classes ($C = \{1, 2, \ldots, c\}$), the first 
$m_1$ data points are labeled as $y_i \in C$, and the other data points are unlabeled. SSL analysis 
finds a non-negative matrix $F_{m \times c}$, which is used to generate the labels of the unlabeled data 
points as $y_i = \arg\max_{j \le c} F_{ij}$. The cost function of $F_{m \times c}$ is defined as follows [43,44]: 

$$\Phi(F) = \frac{1}{2}\left(\sum_{i,j=1}^{m} W_{ij}\left\|\frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}}\right\|^2 + \mu \sum_{i=1}^{m} \left\|F_i - Y_i\right\|^2\right), \tag{1}$$

where 

$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right), & i \ne j \\ 0, & i = j \end{cases}$$

is an affinity matrix, $D$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{m} W_{ij}$, $F_i$ and $F_j$ are the 
$i$-th and $j$-th row vectors of $F$, and $Y_i$ is the $i$-th row vector of $Y$, with 

$$Y_{ij} = \begin{cases} 1, & y_i = j \\ 0, & y_i \ne j \end{cases}$$

and $Y_i = 0$ for the unlabeled data points. On the right-hand side of Eq. (1), 
the first term is the smoothness function, which requires the neighboring data points 
to belong to the same class, while the second term is the fitting function, which constrains the 
labeled data points to be consistent with their original labels. The optimal $F^*$ 
then satisfies $F^* = \arg\min_F \Phi(F)$. Differentiating $\Phi(F)$ with respect to $F$ and evaluating at $F^*$, the equation 

$$\left.\frac{\partial \Phi}{\partial F}\right|_{F=F^*} = F^* - SF^* + \mu(F^* - Y) = 0$$

can be obtained, where $S = D^{-1/2} W D^{-1/2}$. Therefore, 

$$F^* - \frac{1}{1+\mu} S F^* - \frac{\mu}{1+\mu} Y = 0.$$

Letting $\alpha = 1/(1+\mu)$ and $\beta = \mu/(1+\mu)$, the 
relation $F^* = \beta (I - \alpha S)^{-1} Y$ can thus be obtained [43,44]. 
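
The closed-form solution can also be reached by repeatedly applying the update F = αSF + βY, whose fixed point is β(I − αS)⁻¹Y. The sketch below uses that iteration to avoid a matrix inverse. It is an illustration under assumptions, not the authors' code: the function names, the default values of σ and μ, the iteration count, and the 0-based class indices (with None marking unlabeled points) are all hypothetical choices.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def propagate_labels(points, labels, n_classes, sigma=1.0, mu=0.1, n_iter=200):
    """Semi-supervised label propagation on a Gaussian affinity graph."""
    m = len(points)
    # Affinity matrix W with zero diagonal.
    W = [[0.0 if i == j else
          math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
          for j in range(m)] for i in range(m)]
    d = [sum(row) for row in W]                       # diagonal of D
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(m)] for i in range(m)]
    alpha, beta = 1.0 / (1.0 + mu), mu / (1.0 + mu)
    # One-hot label matrix Y; rows of unlabeled points stay all zero.
    Y = [[1.0 if labels[i] == k else 0.0 for k in range(n_classes)]
         for i in range(m)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(m)) + beta * Y[i][k]
              for k in range(n_classes)] for i in range(m)]
    # Assign each point to the class with the largest score in its row of F.
    return [max(range(n_classes), key=lambda k: F[i][k]) for i in range(m)]


# Toy example: two well-separated 1-D clusters with one labeled point each.
points = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
labels = [0, None, None, 1, None, None]
predicted = propagate_labels(points, labels, n_classes=2)
```

In the toy example each labeled point passes its label to the unlabeled neighbors in its own cluster, which is the layer-by-layer behavior described above.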

To validate the classification algorithm, a leave-one (patient)-out cross-validation analysis 
was conducted. In this process, the data from one of the patients were used for testing, while 
the remaining patients' data were used to train the classifier. Since three z-stack CARS images 
were captured for each sample, we used the voting method to manage the conflicting results 
among individual stacks. The sample's subtype was determined according to the classification 
results of the majority of the z-stacks. For example, if two of three z-stacks were classified 
into the same class, this sample would be assigned to that class regardless of the result 
from the third stack. 
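
The validation scheme can be sketched as follows. This is a minimal illustration with hypothetical function names and a hypothetical sample layout of (patient_id, features, label) tuples. The text does not say how a three-way disagreement is handled, so returning None in that case is an assumption.

```python
from collections import Counter


def leave_one_patient_out(samples):
    """Yield (held_out_patient, training_samples, test_samples) for each patient."""
    patients = sorted({patient_id for patient_id, *_ in samples})
    for held_out in patients:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def majority_vote(image_labels):
    """Return the class predicted for most images, or None if there is a tie."""
    ranked = Counter(image_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                      # no majority (assumed behavior)
    return ranked[0][0]


# Toy data: three patients, features left empty for brevity.
samples = [("p1", None, "IDC"), ("p1", None, "IDC"),
           ("p2", None, "ILC"), ("p3", None, "DCIS")]
folds = list(leave_one_patient_out(samples))
vote = majority_vote(["IDC", "IDC", "ILC"])
```

Splitting by patient rather than by image keeps all images from the held-out patient out of the training set, so the classifier is never tested on tissue it has partly seen.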

3. Results 

3.1. CARS Images of breast tissues 

Figure 4 shows a comparison of CARS images of normal breast tissue, benign proliferative 
lesion and carcinomas with their H&E stained photomicrographic images. On H&E stained 
images, normal breast tissues predominantly consist of adipose and fibrous structures (Figs. 
4(B) and 4(D)). These structures possess strong CARS signals and can be clearly recognized 
in CARS images, as shown in Figs. 4(A) and 4(C), respectively. No obvious cells were 
identified in the normal tissues with CARS, possibly because of the overwhelming CARS 
signals from the fat and fibrous tissue components. Fibroadenoma is a common benign 
biphasic fibroepithelial tumor. One of its unique features is the intracanalicular pattern, in 
which the compressed duct shows linear branching pattern with slit-like lumen, as indicated 
by the arrow in the H&E stained image shown in Fig. 4(F). The same pattern is also clearly 
identified in the CARS image, as indicated by the arrow in Fig. 4(E). 

Figure 4(H) shows the H&E stained image of a solid subtype of DCIS. The tumor cells are 
confined within the basement membrane and nearly fill the entire duct space. There are 
prominent cytoplasmic borders with sharp outline. These features are also presented in the 
CARS image shown as Fig. 4(G). Similarly, IG-IDC consists of tumor cells growing in cords, 
nests, tubules, and anastomosing cell clusters, invading into the surrounding stroma, as shown 
in Fig. 4(J). All these features are distinctly observed in the CARS image shown in Fig. 4(I). 
The CARS and H&E stained images of HG-IDC are shown as Figs. 4(K) and 4(L), 
respectively, in which tumor cells are arranged singly or in small clusters, but without 
noticeable tubule or gland formation. Figure 4(N) shows an H&E stained image of a classic 
ILC, with characteristic infiltrative pattern with single or rows of cells (Indian filing) invading 
into the stroma. This pattern is clearly presented in the CARS image, as shown in Fig. 4(M). It 
is worth noting that in some foci, the single filing infiltrative pattern of ILC is inconspicuous, 
and the tumor cells may just be dispersed in the stroma in an irregular fashion. 




Fig. 4. CARS images of human breast tissues taken at a Raman shift of 2845 cm⁻¹ and their H&E 
stained images from similar regions. Images of (A) adipose and (C) fibrous structures in 
normal tissues and their H&E stained images (B) and (D). Image (E) of a fibroadenoma 
(a benign lesion) and its H&E stained image (F), in which a compressed duct can be clearly 
seen as a linear branching pattern with slit-like lumen, as indicated by the arrow. Image (G) of 
DCIS and its H&E stained image (H). Images of (I) IG-IDC and (K) HG-IDC and their H&E 
stained images (J) and (L). Image (M) of ILC and its H&E stained image (N). Scale bars: 10 
μm. 

Fig. 6. Spatial distributions of four subtypes of breast cancer using the 35-feature set under 
PLSR analysis. 

3.2. Differential diagnosis of breast cancer 

In most cases, histological evaluation alone is sufficient to separate in situ carcinoma from 
invasive carcinomas and to separate ductal and lobular carcinoma subtypes. However, in some 
instances, the differential diagnosis of histological subtype of breast cancer may be difficult 
and cannot be reliably made with conventional H&E staining of histological sections, even by 
an experienced pathologist. Therefore, using breast tissue samples with histologically well- 
characterized lesions by H&E staining, we explored whether an algorithm could reproduce 
identical or near-identical morphological characterizations using CARS images. Figure 5 
provides the distributions of the seven features for four subtypes of breast cancer. Figures 
5(A), 5(B) and 5(C) show that IG-IDC has a larger nuclear size and longer major and 
minor radii, with wider distribution ranges, than the other subtypes. In Voronoi 
tessellation size, HG-IDC has a narrower distribution range and IG-IDC a wider one than the 
other subtypes, as shown in Fig. 5(D). IG-IDC and ILC have longer average and minor 
neighbor distances, with narrower distribution ranges, than the other subtypes, as shown in Figs. 
5(E) and 5(G). IG-IDC has a longer major neighbor distance, with a wider distribution range, 
than the other subtypes, as shown in Fig. 5(F). Figure 6 shows the global spatial distributions of 
the four subtypes of breast cancer using PLSR analysis. Here, it can be seen that DCIS is 
mostly separated from IG-IDC and ILC but partially overlaps with HG-IDC, and that ILC is well 
separated from the other subtypes, while IG-IDC and HG-IDC partially overlap. 

The quantitative analytical results of differential diagnosis of breast cancer subtypes are 
listed in Table 1, while the classification overview is illustrated in Fig. 7. The accuracies of 
separating in situ carcinoma from invasive carcinoma are shown in Table 1(A). All (100%) 
of the in situ carcinomas are correctly identified, while 18% of the invasive carcinomas are 
erroneously classified as in situ carcinoma, giving an overall accuracy of 92%. This result 
can be visualized in the 3-D distribution of these cases in Fig. 6. Based on this result, a 
classification algorithm was developed to separate DCIS from IG-IDC and ILC, and the 
results are shown in Table 1(B). With this algorithm, 96% of DCIS and 95% of IG-IDC and ILC 
samples are correctly classified, with an overall accuracy of 96%. The accuracies of separating 
IDC from ILC are shown in Table 1(C). The two subtypes are separated from each other with 100% 
accuracy, which is also illustrated in the 3-D visualization in Fig. 6. As shown in Table 1(D), 
80% of IG-IDC and 85% of HG-IDC were correctly separated, with an overall accuracy of 83%. 

Fig. 7. Overview of the classification scheme. 



Table 1. Classification accuracy of separating cancer subtypes from each other (Accuracy 
= (true positive + true negative) / total testing samples) 
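
The accuracy definition in the caption of Table 1 can be written as a short function. This is a sketch only: the function name is hypothetical, and the counts in the usage line are made-up numbers for illustration, not values from Table 1.

```python
def accuracy(true_positive, true_negative, total):
    """Accuracy = (true positive + true negative) / total testing samples."""
    if total <= 0:
        raise ValueError("total must be a positive number of testing samples")
    if true_positive + true_negative > total:
        raise ValueError("correct predictions cannot exceed the total")
    return (true_positive + true_negative) / total

# Made-up counts for illustration only: 9 + 8 correct out of 20 samples.
example = accuracy(true_positive=9, true_negative=8, total=20)
```

Because this overall figure pools both classes, it depends on how many samples each class contributes, so it is best read alongside the per-class percentages.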

4. Discussion 

In this exploratory study, we have demonstrated the feasibility of using CARS microscopy to 
distinguish breast cancer from normal tissue and benign proliferative lesion, as well as 
different cancer subtypes. High quality ex vivo images were obtained for normal, benign 
(fibroadenoma), DCIS, IDC and ILC breast tissues by using a custom-built CARS 
microscope. Our results show that CARS microscopy is capable of characterizing breast tissue 
structures and cell types in a manner similar to H&E staining of conventional histological 
sections. On CARS images, normal breast tissues present predominantly adipose and fibrous 
structures, while fibroadenomas possess unique morphological features in accordance with 
pathological criteria. On the other hand, cancer tissues exhibit distinct cellular features with 
high cellularity. These disease-related features can be used to distinguish cancer lesions from 
normal and benign tissues. In addition, the cells of different cancer subtypes also present 
unique features, e.g., the cords and tubules for IG-IDC, the solid pattern for HG-IDC, and the 
single filing pattern for ILC. CARS microscopy was also shown to discriminate these cellular 
features to further separate cancer subtypes. A computerized platform was developed to 
perform nuclear segmentation and classification of different types and subtypes of breast 
lesions. Our results showed a good distinction of cancer from normal tissues and benign 
lesions, as well as among cancer subtypes. Compared to H&E analysis, our approach 
is much faster and eliminates the need for sample processing and the use of 
exogenous contrast agents, thus significantly reducing diagnostic time. The separation of 
some breast lesions, such as atypical ductal hyperplasia and DCIS, is more subtle and will 
likely continue to require conventional histological analysis. Nonetheless, we have 
demonstrated that CARS imaging is reliable, sensitive and specific in discriminating between 
non-invasive in situ and invasive carcinoma, between different histological 
subtypes (e.g., DCIS vs. IG-IDC and ILC), and between different histological grades of carcinoma 
(e.g., intermediate vs. high grade). Such distinctions are critical because they have a direct 
impact on prognosis, choice of treatment modalities, and monitoring of response to therapy. 

The detailed reasons for further separating cancer subtypes from each other are as follows. 
1) Separating in situ carcinoma from invasive carcinoma: in situ carcinomas have an excellent 
prognosis and are generally treated with lumpectomy and sometimes radiation, whereas 
invasive carcinomas have poorer prognosis and are generally treated with surgery 
(lumpectomy or mastectomy with or without lymph node removal), chemotherapy, and 
sometimes radiation. 2) Separating ILC from IDC: Rates of mastectomy compared to breast- 
conserving surgery in ILC are slightly higher than for IDC [45], and ILC is also not a good 
candidate for neoadjuvant chemotherapy because pathologic complete response rates are 
much lower for ILC (3%) than for IDC (15%) [46]. Two large series with long follow-up 
observation [47,48] have revealed trends showing that the prognosis of ILC in the early years 
is somewhat better than the prognosis for IDC, while this trend is reversed in later years. That 
is, after about 6 years, relapse of ILC catches up with IDC. 3) Separating IG-IDC from HG- 
IDC: High-grade means that tumor cells are poorly differentiated in the Bloom-Richardson 
grading system, and poorly differentiated cancers have a worse prognosis. Patients with poor 
prognosis are usually offered more aggressive treatment, such as extensive mastectomy and 
one or more chemotherapy drugs, while patients with a good prognosis are usually offered 
less invasive treatments, such as lumpectomy and radiation or hormone therapy. 

Identifying cancer subtypes by visual inspection of CARS images is sometimes difficult because 
the histological features may be hard to observe within the limited field of view of a 
CARS image. As an adjunctive diagnostic approach, the quantitative analysis of cancer cells 
facilitates more accurate identification of cancer subtypes. To implement 
this quantitative approach, cell nucleus segmentation was performed, followed by extraction 
of seven pathology-related features, each evaluated with five indexes (35 features in total), to 
describe each image. The distributions of the seven features for the four subtypes of breast cancer are 
shown in Fig. 5, which indicates differences among subtypes. Moreover, the global spatial 
distributions of four subtypes using the 35-feature set under PLSR analysis are shown in Fig. 
6, and the results show the robustness of the algorithms in separating cancer subtypes. Finally, 
a quantitative analysis of the differential diagnosis of cancer subtypes was conducted, and the 
results show high accuracies for the separation of cancer subtypes. 
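
The construction of the 35-feature descriptor (seven features, each evaluated with five indexes) can be sketched as below. The seven feature names follow the quantities discussed with Fig. 5. The five summary statistics used here (mean, standard deviation, median, minimum, maximum) are an assumption made for illustration and may differ from the evaluation indexes actually used in the study. The function name and data layout are likewise hypothetical.

```python
import statistics

# Seven per-nucleus features, named after the quantities discussed with Fig. 5.
FEATURES = [
    "nuclear_size", "major_radius", "minor_radius", "voronoi_size",
    "avg_neighbor_dist", "major_neighbor_dist", "minor_neighbor_dist",
]

# Hypothetical choice of five evaluation indexes; the study's own may differ.
INDEXES = {
    "mean": statistics.mean,
    "std": statistics.pstdev,
    "median": statistics.median,
    "min": min,
    "max": max,
}

def image_descriptor(per_nucleus):
    """Summarize one image as a dict of 7 x 5 = 35 named values.

    per_nucleus maps each feature name to a list of values, one per
    segmented nucleus in the image.
    """
    return {
        f"{feature}_{index}": float(fn(per_nucleus[feature]))
        for feature in FEATURES
        for index, fn in INDEXES.items()
    }
```

Summarizing each per-nucleus feature with several statistics gives every image a fixed-length vector regardless of how many nuclei were segmented, which is what a classifier or PLSR projection needs as input.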

As a future study direction, 3D imaging and differential diagnosis of breast cancer using 
CARS microscopy is attractive. It can provide more information than 2D images, and allow 
tracking of features from different levels to identify 3D architecture and low contrast 
structures that are difficult to appreciate from single images [49]. The 3D imaging capability 
of the CARS technique, which stems from its nonlinear nature, makes this aim achievable. Nonetheless, 
prospective studies with a larger sample size are necessary for the cancer subtypes (DCIS in 
particular) to further evaluate the efficacy of our method. The current study is limited by the 
number of samples and may therefore be subject to bias. 

5. Conclusion 

We demonstrated, for the first time, the feasibility of integrating label-free CARS 
microscopy and quantitative data analysis to distinguish breast cancer from normal tissue and 
benign proliferative lesions, and to further separate cancer subtypes. This study suggests 
that quantitative CARS microscopy has the potential to be used as a routine examination tool 
to rapidly identify breast cancer ex vivo. For future studies, the label-free and fast imaging 
properties of CARS could propel this technique to become a non-invasive approach for in vivo 
and real-time diagnosis of breast cancer without the need for histological staining or 
administration of exogenous contrast agents. Although conventional histological analysis 
would remain the gold standard and would still be necessary for difficult cases requiring the 
analysis of subtle pathologic features or immunohistochemistry markers, the fact that CARS 
appears able to delineate major diagnostic entities shows its promise to greatly increase 
the amount of information available in a timely manner to patients and physicians during biopsy or 
excision procedures. 